CHAPTER 5 

Reliability of Physical Findings 


KEY TEACHING POINTS 

Reliability refers to how often two clinicians examining the same patient agree 
about the presence or absence of a particular physical finding. Commonly used
measurements of reliability are simple agreement and the kappa (κ) statistic.
About 60% of physical findings have κ-statistics of 0.4 or more, indicating that
observed agreement is moderately good or better. 

Despite the common belief that technologic tests are more precise than bedside 
observation, the κ-statistics observed for most diagnostic standards (e.g., chest
radiography, computed tomography, angiography, magnetic resonance imaging, 
endoscopy, and pathology) are similar to those observed for physical signs. 

Some causes of interobserver disagreement can be eliminated, but because clinical 
medicine is inherently a human enterprise (even when interpreting technologic 
tests), subjectivity and a certain level of clinical disagreement will always be present. 


Reliability refers to how often multiple clinicians, examining the same patients, agree 
that a particular physical sign is present or absent. As characteristics of a physical 
sign, reliability and accuracy are distinct qualities, although significant interobserver 
disagreement tends to undermine the finding’s accuracy and prevents clinicians from 
applying it confidently to their own practice. Disagreement about physical signs also 
contributes to the growing sense among clinicians, not necessarily justified, that physical
examination is less scientific than more technologic tests, such as clinical imaging
and laboratory testing, and that physical examination lacks their diagnostic authority. 

The most straightforward way to express reliability, or interobserver agreement, 
is simple agreement, which is the proportion of total observations in which clinicians
agree about the finding. For example, if two clinicians examining 100 patients
with dyspnea agree that a third heart sound is present in 5 patients and absent in 75
patients, simple agreement would be 80% (i.e., [5 + 75]/100 = 0.80); in the remaining
20 patients, only one of the two clinicians heard a third heart sound. Simple
agreement has advantages, including being easy to calculate and understand, but 
a significant disadvantage is that agreement may be quite high by chance alone. 
For example, if one of the clinicians in our hypothetical study heard a third heart 
sound in 10 of the 100 dyspneic patients and the other heard it in 20 of the patients 
(even though they agreed about the presence of the heart sound in only 5 patients), 
simple agreement by chance alone would be 74%.* With chance agreement this 
high, the observed 80% agreement no longer seems so impressive. 
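The arithmetic in this example can be checked with a short Python sketch. The function and parameter names are illustrative assumptions, not part of the chapter; the numbers are those of the hypothetical study above (100 patients, 5 joint positives, 75 joint negatives, and 10 and 20 positive calls by the two clinicians).

```python
def simple_agreement(both_present, both_absent, total):
    """Proportion of patients on whom the two clinicians agree."""
    return (both_present + both_absent) / total


def chance_agreement(positives_a, positives_b, total):
    """Agreement expected by chance alone, given how often each
    clinician calls the finding present (assumes independent calls)."""
    p_a = positives_a / total
    p_b = positives_b / total
    # Chance of both saying "present" plus chance of both saying "absent".
    return p_a * p_b + (1 - p_a) * (1 - p_b)


observed = simple_agreement(both_present=5, both_absent=75, total=100)
expected = chance_agreement(positives_a=10, positives_b=20, total=100)
print(f"observed agreement: {observed:.2f}")  # 0.80
print(f"chance agreement:   {expected:.2f}")  # 0.74
```

Only 6 percentage points separate observed from chance agreement here, which is why the 80% figure is less impressive than it first appears.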


* Agreement by chance approaches 100% as the percentage of positive observations for both 
clinicians approaches 0% or 100% (i.e., both clinicians agree that a finding is very uncommon 
or very common). The Appendix at the end of this chapter shows how to calculate chance 
agreement. 
PART 2 UNDERSTANDING THE EVIDENCE


To address this problem, most clinical studies now express interobserver agreement
using the kappa (κ) statistic, which usually has values between 0 and 1.
(The Appendix at the end of this chapter shows how to calculate the κ-statistic.)
A κ-value of 0 indicates that observed agreement is the same as that expected by
chance, and a κ-value of 1 indicates perfect agreement. According to convention,
a κ-value of 0 to 0.2 indicates slight agreement; 0.2 to 0.4 fair agreement; 0.4 to 0.6
moderate agreement; 0.6 to 0.8 substantial agreement; and 0.8 to 1.0 almost perfect
agreement.† Rarely, physical signs have κ-values less than 0 (theoretically as low as
-1), indicating the observed agreement was worse than chance agreement. 
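The conventional bands can be written as a small lookup, sketched below in Python. The function name is hypothetical, and the handling of boundary values is an assumption: the stated ranges share their endpoints, so this sketch assigns each boundary to the higher band, which matches the chapter's reading that a κ of 0.4 or more is "moderate or better."

```python
def interpret_kappa(kappa):
    """Return the conventional label for a kappa value between -1 and 1.

    Assumption: a value on a shared boundary (0.2, 0.4, 0.6, 0.8) is
    assigned to the higher band.
    """
    if not -1.0 <= kappa <= 1.0:
        raise ValueError("kappa must lie between -1 and 1")
    if kappa < 0:
        return "worse than chance"
    if kappa < 0.2:
        return "slight"
    if kappa < 0.4:
        return "fair"
    if kappa < 0.6:
        return "moderate"
    if kappa < 0.8:
        return "substantial"
    return "almost perfect"


# Values taken from this chapter: the third heart sound example (0.23),
# the abdominojugular test (0.92), and diaphragm excursion by percussion (-0.04).
for value in (0.23, 0.92, -0.04):
    print(value, interpret_kappa(value))
```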

Table 5.1 presents the κ-statistic for most of the physical signs discussed in
this book, demonstrating that, with rare exceptions, observed agreement is better
than chance agreement (i.e., κ-statistic exceeds 0). About 60% of findings have a
κ-statistic of 0.4 or more, indicating that observed agreement is moderate or better.

Clinical disagreement occurs for many reasons: some causes clinicians can control,
but others are inextricably linked to the very nature of clinical medicine and
human observation in general. The most prominent reasons include the following: 
First, the physical sign’s definition can be vague or ambiguous. For example, experts 
recommend about a dozen different ways to perform auscultatory percussion of the 
liver, thus making the sign so nebulous that significant interobserver disagreement 
is guaranteed. Ambiguity also results if signs are defined with terms that are not 
easily measurable. For example, clinicians assessing whether a peripheral pulse is 
present or absent demonstrate moderate-to-almost perfect agreement (κ = 0.52 to
0.92, see Table 5.1), but when the same clinicians are asked to record whether the
palpable pulse is normal or diminished, they have great difficulty agreeing about
the sign (κ = 0.01 to 0.15) simply because they have no idea what the next clinician
means by “diminished.” Second, the clinician’s technique may be flawed.
For example, common mistakes are using the diaphragm instead of the bell of the 
stethoscope to detect the third heart sound, or stating a muscle stretch reflex is 
absent without first trying to elicit it using a reinforcing maneuver (e.g., Jendrassik 
maneuver). A third reason for clinical disagreement involves biologic variation of 
the physical sign. The pericardial friction rub, pulsus alternans, cannon A waves,
Cheyne-Stokes respirations, and many other signs are notoriously evanescent,
tending to come and go over time. Fourth, the clinician could be careless or inattentive.
The bustle of an active practice may lead clinicians to listen to the lungs
while conducting the patient interview or to search for a subtle murmur in a noisy 
emergency room. Reliable observations require undistracted attention and an alert 
mind. Lastly, the clinician’s biases can influence the observation. When findings are 
equivocal, expectations influence perceptions. For example, in a patient who just 
started blood pressure medications, borderline hypertension may become normal 
blood pressure; in a patient with increasing bilateral edema, borderline distended 
neck veins may become clearly elevated venous pressure; or in a patient with new
weakness, the equivocal Babinski sign may become clearly positive. Sometimes, 
biases actually create the finding: if the clinician holds a flashlight too long over 
an eye with suspected optic nerve disease, he may temporarily bleach the retina of 
that eye and produce a Marcus Gunn pupil, thus confirming the original suspicion. 

The lack of perfect reliability with physical diagnosis is sometimes regarded as 
a significant weakness, leading to the charge that physical diagnosis is less reliable
and scientific than clinical imaging and laboratory testing. Nonetheless,


†No measure of reliability is perfect, especially for findings whose prevalence clinicians agree
approaches 0% or 100%. For these findings, simple agreement tends to overestimate reliability,
and the κ-statistic tends to underestimate the reliability.


TABLE 5.1 Interobserver Agreement and Physical Signs

Finding (ref)    κ-Statistic

GENERAL APPEARANCE

Mental Status Examination
Mini-mental status examination (1)    0.28-0.80
Clock-drawing test (Wolf-Klein method) (2)    0.73
Confusion assessment method for delirium (3-6)    0.70-0.91
Altered mental status (7)    0.71

Stance and Gait
Abnormal gait (8,9)    0.11-0.71

Skin
Patient appears anemic (10,11)    0.23-0.48
Nailbed pallor (12)    0.19-0.34
Conjunctival pallor (rim method) (13)    0.54-0.75
Ashen or pale skin (7)    0.34
Cyanosis (10,14)    0.36-0.70
Jaundice (15)    0.65
Loss of hair (16)    0.51
Vascular spiders (15,17)    0.64-0.92
Palmar erythema (15,17)    0.37-1.00

Hydration Status
Patient appears dehydrated (10)    0.44-0.53
Axillary dryness (18)    0.50
Increased moisture on skin (10)    0.31-0.53
Capillary refill >3 s (7)    0.29
Capillary refill >5 s (19)    0.74-0.91

Nutritional Assessment
Abnormal nutritional state (10)    0.27-0.36

Other
Consciousness impaired (10)    0.65-0.88
Patient appears older than age (10)    0.38-0.42
Patient appears in pain (10)    0.43-0.75
Generally unwell in appearance (10)    0.52-0.64

VITAL SIGNS
Tachycardia (heart rate >100/min) (20)    0.85
Bradycardia (heart rate <60/min) (20)    0.87
Systolic hypertension (SBP >160 mmHg) (20)    0.75
Hypotension (SBP <90 mmHg) (20,21)    0.27-0.90
Osler sign (22-24)    0.26-0.72
Rumpel-Leede (tourniquet) test (25,26)    0.76-0.88
Elevated body temperature, palpating the skin (10)    0.09-0.23
Tachypnea (7,14,20)    0.25-0.60
HEAD AND NECK

Pupils
Swinging flashlight test (relative afferent pupil defect) (27)    0.63

Diabetic Retinopathy
Microaneurysms (28,29)    0.58-0.66
Intraretinal hemorrhages (28,29)    0.89
Hard exudates (28,29)    0.66-0.74
Cotton wool spots (28,29)    0.56-0.67
Intraretinal microvascular abnormalities (IRMA) (28,29)    0.46
Neovascularization near disc (28,29)    0.21-0.48
Macular edema (28,29)    0.21-0.67
Overall grade (28,29)    0.65

Hearing
Whispered voice test (30)    0.16-1.0
Finger rub test (31)    0.83

Thyroid
Thyroid gland diffuse, multinodular or solitary nodule (32)    0.25-0.70
Goiter (33,34)    0.38-0.77

Meninges
Nuchal rigidity, present or absent (35-37)    0.24-0.76

LUNGS

Inspection
Clubbing (general impression) (14,38)    0.33-0.45
Clubbing (interphalangeal depth ratio) (39)    0.98
Clubbing (Schamroth sign) (39)    0.64
Breathing difficulties (10)    0.54-0.69
Gasping respirations (7)    0.63
Reduced chest movement (14,40,41)    0.14-0.38
Kussmaul respirations (42)    0.70
Pursed lip breathing (41)    0.45
Asymmetric chest expansion (43)    0.85
Scalene or sternocleidomastoid muscle contraction (7,41,44)    0.52-0.57
Kyphosis (38)    0.37
Barrel chest (41)    0.62
Thoracic ratio >0.9 (41)    0.32
Displaced trachea (14)    0.01

Palpation
Tracheal descent during inspiration (44)    0.62
Laryngeal height <5.5 cm (41)    0.59
Impalpable apex beat (14,38)    0.33-0.44
Decreased tactile fremitus (14,43)    0.24-0.86
Increased tactile fremitus (14)    0.01
Subxiphoid point of maximal cardiac impulse (45)    0.30
Paradoxical costal margin movement (44 46)    0.56-0.82

Percussion
Hyperresonant percussion note (14,40,45)    0.26-0.50
Dull percussion note (14,40,43,47)    0.16-0.84
Diaphragm excursion more or less than 2 cm, by percussion (45)    -0.04
Diminished cardiac dullness (45)    0.49
Auscultatory percussion abnormal (43,48)    0.18-0.76

Auscultation
Reduced breath sound intensity (14,40,41,43,45,47,49,50)    0.16-0.89
Bronchial breathing (14,40)    0.19-0.32
Whispering pectoriloquy (14)    0.11
Reduced vocal resonance (43)    0.78
Crackles (1 14,47,49,51-54)    0.21-0.65
Wheezes (14,45-47,49,50)    0.43-0.93
Rhonchi (40,50)    0.38-0.55
Pleural rub (14,43)    -0.02 to 0.51

Special Tests
Snider test <10 cm (45)    0.39
Forced expiratory time (41,45,55,56)    0.27-0.70
Hoover sign (50)    0.74
Wells simplified rule for pulmonary embolism (57)    0.54-0.62

HEART

Neck Veins
Neck veins, elevated or normal (51 53,58)    0.08-0.71
Abdominojugular test (58)    0.92

Palpation
Palpable apical impulse present (59-61)    0.68-0.82
Palpable apical impulse measurable (62)    0.56
Palpable apical impulse displaced lateral to midclavicular line (51,59,60,63)    0.43-0.86
Apical beat normal, sustained, double, or absent (63)    0.88

Percussion
Cardiac dullness >10.5 cm from midsternal line (64,65)    0.57

Auscultation
S2 diminished or absent, vs. normal (66)    0.54
Third heart sound (51 53,58,67-69)    -0.17 to 0.84
Fourth heart sound (68,70)    0.15-0.71
Systolic murmur, present or absent (66)    0.19
Systolic murmur radiates to right carotid (66)    0.33
Systolic murmur, long systolic or early systolic (71)    0.78
Murmur intensity (Levine grade) (72)    0.43-0.60
Systolic murmur grade >2/6 (73)    0.59
Carotid Pulsation
Delayed carotid upstroke (66)    0.26
Reduced carotid volume (66)    0.24

ABDOMEN

Inspection
Abdominal distension (74,75)    0.35-0.42
Abdominal wall collateral veins, present vs. absent (15)    0.47

Palpation and Percussion
Ascites (15,17,53)    0.47-0.75
Abdominal tenderness (74 76)    0.31-0.68
Surgical abdomen (75)    0.27
Abdominal wall tenderness test (77,78)    0.52-0.81
Rebound tenderness (74)    0.25
Guarding (74,75)    0.36-0.49
Rigidity (74)    0.14
Abdominal mass palpated (75)    0.82
Palpable spleen (15,17)    0.33-0.75
Palpable liver edge (79)    0.44-0.53
Liver consistency, normal or abnormal (15)    0.4
Liver firm to palpation (80)    0.72
Liver, nodular or not (15)    0.29
Liver, tender or not (17)    0.49
Liver, span >9 cm by percussion (51)    0.11
Spleen palpable or not (81)    0.56-0.70
Spleen percussion sign (Traube), positive or not (82)    0.19-0.41
Abdominal aortic aneurysm, present vs. absent (83)    0.53

Auscultation
Normal bowel sounds (75)    0.36

EXTREMITIES

Peripheral Vascular Disease
Peripheral pulse, present vs. absent (84,85)    0.52-0.92
Peripheral pulse, normal or diminished (84)    0.01-0.15
Cool extremities (53)    0.46
Severity of skin mottling over leg (86,87)    0.87

Diabetic Foot
Monofilament sensation, normal or abnormal (88-90)    0.48-0.83
Probe-to-bone test (91-93)    0.59-0.84

Edema and Deep Venous Thrombosis
Dependent edema (51 53)    0.39-0.73
Wells pretest probability for DVT (94,95)    0.74-0.75

Musculoskeletal System: Shoulder
Shoulder tenderness (96)    0.32
Painful arc (96-99)    0.45-0.64
External rotation of shoulder <45 degrees (96)    0.68
Supraspinatus test (empty can) (96,99,100)    0.44-0.94
Infraspinatus test (resisted external rotation) (96,97)    0.49-0.67
Impingement sign (Hawkins-Kennedy) (96,97,99,100)    0.29-1.0
Drop arm test (96,99)    0.28-0.35

Musculoskeletal System: Hip
Patrick test (101)    0.47
Passive internal rotation <25 degrees (101)    0.51

Musculoskeletal System: Knee
Ottawa knee rules (102,103)    0.51-0.77
Knee effusion visible (102,104,105)    0.28-0.59
Knee flexion <90 degrees (102)    0.74
Patellar tenderness (102,104)    0.69-0.76
Head of fibula tenderness (102)    0.64
Inability to bear weight immediately and in emergency room after knee injury (102,104)    0.75-0.81
Bony swelling of knee (106)    0.55
Joint line tenderness (105-108)    0.11-0.43
Patellofemoral crepitus (106)    0.24
Mediolateral instability of knee (106)    0.23
McMurray sign (105,108,109)    0.16-0.35

Musculoskeletal System: Ankle
Inability to walk 4 steps immediately and in emergency room after ankle injury (110,111)    0.71-0.97
Medial malleolar tenderness (111)    0.82
Lateral malleolar tenderness (111)    0.80
Navicular tenderness (111)    0.91
Base of 5th metatarsal tenderness (111)    0.94
Ottawa ankle rule (112)    0.41
Ottawa midfoot rule (112)    0.77

NEUROLOGIC EXAMINATION

Visual Fields
Visual fields by confrontation (113)    0.63-0.81

Cranial Nerves
Pharyngeal sensation, present or absent (114)    1.0
Facial palsy, present or absent (115,116)    0.57
Dysarthria, present or absent (117,118)    0.41-0.77
Water swallow test (50 mL) (119)    0.60
Oxygen desaturation test (for aspiration risk) (119)    0.60
Abnormal tongue strength (117)    0.55-0.63

Motor Examination
Muscle strength, Medical Research Council (MRC) scale (120-123)    0.69-0.93
Foot tapping test (124)    0.73
Muscle atrophy (125,126)    0.32-0.82


Downloaded for Ahmed Othman (aothman@kockw.com) at Kuwait Oil Company from ClinicalKey.com by Elsevier on 
December 07, 2017. For personal use only. No other uses without permission. Copyright ©2017. Elsevier Inc. All rights 

reserved. 






Spasticity, 6 point scale (127)    0.21-0.61
Rigidity, 4 point scale (128)    0.64
Asterixis (15)    0.42
Tremor (126)    0.74
Pronator drift (129)    0.39
Forearm rolling test (129)    0.73

Sensory Examination
Light touch sensation, normal, diminished, or increased (125,126)    0.22-0.63
Pain sensation, normal, diminished, or increased (121,125,126)    0.41-0.57
Vibratory sensation, normal or diminished (125,126)    0.28-0.54
Romberg test (126)    0.64

Reflex Examination
Reflex amplitude, National Institute of Neurological Disorders and Stroke (NINDS) scale (130)    0.51-0.61
Ankle jerk, present or absent (121,131,132)    0.34-0.94
Asymmetric knee jerk (121)    0.42
Babinski response (115,116,124,126,133,134)    0.17-0.60
Finger flexion reflex (135)    0.65
Primitive reflexes, amplitude and persistence (136)    0.46-1.0

Coordination
Finger-nose test (115,116,126,129)    0.14-0.65
Heel-shin test (126)    0.58

Peripheral Nerve
Spurling test (137)    0.60
Katz hand diagram (138)    0.86
Flick sign (139)    0.90
Hypalgesia index finger (139)    0.50
Tinel sign (139)    0.47
Phalen sign (139)    0.79
Straight-leg raising test (121,140-144)    0.21-0.80
Crossed-leg raising test (121)    0.49

Interpretation of the κ-statistic: 0 to 0.2 slight agreement, 0.2 to 0.4 fair agreement, 0.4 to 0.6
moderate agreement, 0.6 to 0.8 substantial agreement, 0.8 to 1.0 almost perfect agreement.


Table 5.2 shows that, for most of our diagnostic standards—chest radiography, 
computed tomography, screening mammography, angiography, magnetic resonance 
imaging, ultrasonography, endoscopy, and pathology—interobserver agreement is 
also less than perfect, with κ-statistics similar to those observed with physical signs.
Even with laboratory tests, which present the clinician with a single, indisputable 
number, interobserver disagreement is still possible and even common, simply 
because the clinician has to interpret the laboratory test’s significance. For example, 
in one study of three endocrinologists reviewing the same thyroid function tests and 
other clinical data of 55 consecutive outpatients with suspected thyroid disease, 
the endocrinologists disagreed about the final diagnosis 40% of the time. 32 The 
computerized interpretation of test results performs no better: in a study of pairs 
TABLE 5.2 Interobserver Agreement: Diagnostic Standards

Finding (ref)    κ-Statistic

CHEST RADIOGRAPHY
Cardiomegaly (58)    0.48
Pulmonary infiltrate (145)    0.38
Pneumonia (146)    0.45
Interstitial edema (58)    0.83
Pulmonary vascular redistribution (58)    0.50
Grading pulmonary fibrosis, 4 point scale (147)    0.45

CONTRAST VENOGRAPHY
Deep vein thrombosis in leg (148)    0.53

SCREENING MAMMOGRAPHY
Suspicious lesion, present vs. absent (149)    0.47

DIGITAL SUBTRACTION ANGIOGRAPHY
Renal artery stenosis (150)    0.65

CORONARY ARTERIOGRAPHY
Classification of coronary artery lesions (151)    0.33

ARTHROSCOPY
Inflamed or torn supraspinatus tendon (152)    0.47

COMPUTED TOMOGRAPHY OF HEAD
Normal or abnormal, patient with stroke (153)    0.60
Lesion on right or left side, patient with stroke (153)    0.65
Mass effect, present or absent (153)    0.52

COMPUTED TOMOGRAPHY OF THE CHEST
Lung cancer staging (154)    0.40-0.60
Submassive pulmonary embolism present (angiography) (155)    0.47
Coronary lesion on CT coronary angiography (156)    0.57

MAGNETIC RESONANCE IMAGING OF HEAD
Compatible with multiple sclerosis (157)    0.57-0.87
Pituitary microadenoma present (158)    0.30

MAGNETIC RESONANCE IMAGING OF LUMBAR SPINE
Intervertebral disc extrusion, protrusion, bulge, or normal (159,160)    0.59
Lumbar nerve root compression (160,161)    0.63-0.83

ULTRASONOGRAPHY
Calf deep vein thrombosis, present or absent (162)    0.69
Thyroid nodule, present or absent (163,164)    0.57-0.66
Thyroid nodule, cystic or solid (165)    0.64
Goiter is present (34)    0.63

ELECTROCARDIOGRAPHY
Diagnosis of narrow-complex tachycardia (166)    0.70

ECHOCARDIOGRAPHY
Severity of valvular regurgitation (167,168)    0.32-0.55

ENDOSCOPY
Grade of reflux esophagitis (169)    0.55

PATHOLOGIC EXAMINATION OF LIVER BIOPSY
Cholestasis (170)    0.40
Alcoholic liver disease (170)    0.49
Cirrhosis (170)    0.59

Interpretation of the κ-statistic: 0 to 0.2 slight agreement, 0.2 to 0.4 fair agreement, 0.4 to 0.6
moderate agreement, 0.6 to 0.8 substantial agreement, 0.8 to 1.0 almost perfect agreement.

of electrocardiograms taken only 1 minute apart from 92 patients, the computer 
interpretation was significantly different 40% of the time, even though the tracings 
showed no change. 171

By defining abnormal findings precisely, by studying and mastering examination
technique, and by observing every detail at the bedside attentively
and without bias or distraction, we can minimize interobserver disagreement 
and make physical diagnosis more precise. It is simply impossible, however, to 
abstract every detail of clinicians’ observations of patients into exact physical 
signs; in this way, physical diagnosis is no different from any of the other tools 
we use to categorize disease. So long as both the material and the observers of 
clinical medicine are human beings, a certain amount of subjectivity will always 
be with us. 


APPENDIX: CALCULATION OF THE κ-STATISTIC

The observations of two observers who are examining the same $N$ patients independently
are customarily displayed in a 2 × 2 table, similar to that in Fig. 5.1. Observer
A finds the sign to be present in $w_1$ patients and absent in $w_2$ patients; observer B
finds the sign to be present in $y_1$ patients and absent in $y_2$ patients. The two observers
agree the sign is present in $a$ patients and absent in $d$ patients. Therefore, the
observed agreement ($P_O$) is

$$P_O = \frac{a + d}{N}$$

Calculation of the κ-statistic first requires calculation of the agreement that
would have occurred by chance alone. Among all the patients, observer A found
the fraction $w_1/N$ to have the sign; therefore, by chance alone, among the $y_1$
patients with the sign according to observer B, observer A would find the sign in
$(w_1/N)$ times $y_1$, or $w_1 y_1/N$, patients (i.e., this is the number of patients in which
both observers agree the sign is present, by chance alone). Similarly, both observers
would agree the sign is absent by chance alone in $w_2 y_2/N$ patients. Therefore, the
expected chance agreement ($P_E$) is their sum, divided by $N$:

$$P_E = \frac{w_1 y_1 + w_2 y_2}{N^2}$$

This equation shows that agreement by chance alone ($P_E$) approaches 100% as
both $w_1$ and $y_1$ approach 0 or $N$ (i.e., both clinicians agree that a finding is rare or
that it is very common).
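The behavior of chance agreement at extreme prevalence can be illustrated numerically. The Python sketch below applies the chance-agreement formula of this Appendix, (w1·y1 + w2·y2)/N², to a hypothetical series in which both observers call the sign present in the same, progressively smaller number of 100 patients; the function name and the series of values are assumptions chosen only for illustration.

```python
def expected_chance_agreement(w1, y1, n):
    """Chance agreement P_E = (w1*y1 + w2*y2) / n**2.

    w1, y1: number of patients in whom observers A and B, respectively,
    find the sign present; w2 and y2 are the complementary counts.
    """
    w2 = n - w1
    y2 = n - y1
    return (w1 * y1 + w2 * y2) / n**2


n = 100
# Hypothetical series: the finding becomes rarer for both observers.
for positives in (50, 20, 10, 5, 1):
    p_e = expected_chance_agreement(positives, positives, n)
    print(f"{positives:>2} positives each: chance agreement = {p_e:.4f}")
```

In this series chance agreement rises from 0.50 when each observer finds the sign in half the patients to above 0.98 when each finds it in a single patient, so a high simple agreement for a rare finding carries little information by itself.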




                          Observer B:
                          Sign +     Sign -
Observer A:   Sign +        a          b         w1
              Sign -        c          d         w2
                            y1         y2        N

Sample problem:

                          Observer B:
                          Sign +     Sign -
Observer A:   Sign +        5          5         10
              Sign -        15         75        90
                            20         80        100

Observed agreement = 0.80
Chance agreement = 0.74

FIG. 5.1 INTEROBSERVER AGREEMENT AND THE κ-STATISTIC. Top half: Conventional
2 × 2 table displaying data for calculation of κ-statistic. Bottom half: A sample case, in which
observed agreement is 80%, chance agreement is 74%, and the κ-statistic is 0.23 (see Appendix
for discussion).


The κ-statistic is the increment in observed agreement beyond that expected
by chance ($P_O - P_E$), divided by the maximal increment that could have been
observed had the observed agreement been perfect ($1 - P_E$):

$$\kappa = \frac{P_O - P_E}{1 - P_E}$$

For example, Fig. 5.1 depicts the observations of two observers in a study of 100 
patients with dyspnea. Both agree the third heart sound is present in 5 patients and 
absent in 75 patients; therefore simple agreement is (5 + 75)/100 or 0.80. By chance
alone, they would have agreed about the sound being present in (10 × 20)/100
patients (i.e., 2 patients) and absent in (90 × 80)/100 patients (i.e., 72 patients);
therefore, chance agreement is (2 + 72)/100 patients or 0.74. The κ-statistic for this
finding becomes (0.80 − 0.74)/(1 − 0.74) = (0.06)/(0.26) = 0.23.
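The whole calculation can be reproduced from the four cells of the 2 × 2 table, as in the Python sketch below. The function name is hypothetical. The text gives a = 5 and d = 75 directly; the off-diagonal cells b = 5 and c = 15 are not stated and are inferred here from the observers' totals of 10 and 20 positive findings.

```python
def kappa_from_2x2(a, b, c, d):
    """Kappa for two observers from a 2 x 2 table.

    a: both find the sign present      b: only observer A finds it present
    c: only observer B finds it present  d: both find it absent
    """
    n = a + b + c + d
    w1, w2 = a + b, c + d  # observer A: sign present / absent
    y1, y2 = a + c, b + d  # observer B: sign present / absent
    p_o = (a + d) / n                       # observed agreement
    p_e = (w1 * y1 + w2 * y2) / n**2        # chance agreement
    if p_e == 1:
        raise ValueError("kappa is undefined when chance agreement is 100%")
    return (p_o - p_e) / (1 - p_e)


# Third heart sound example: 100 dyspneic patients.
kappa = kappa_from_2x2(a=5, b=5, c=15, d=75)
print(round(kappa, 2))  # 0.23
```

The guard against a chance agreement of exactly 100% is a design choice of this sketch; in that situation both observers gave every patient the same answer and the denominator of the κ formula is zero.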

The references for this chapter can be found on www.expertconsult.com. 